Optical Character Recognition (OCR) for Telugu: Database, Algorithm and Application

Authors

  • Konkimalla Chandra Prakash
  • Y. M. Srikar
  • Gayam Trishal
  • Souraj Mandal
  • Sumohana S. Channappayya
Abstract

Telugu is a Dravidian language spoken by more than 80 million people worldwide. Optical character recognition (OCR) of the Telugu script has wide-ranging applications, including education, healthcare, and administration. The beautiful Telugu script, however, is very different from the scripts of Germanic languages like English and German. This makes applying transfer learning from Germanic OCR solutions to Telugu a non-trivial task. To address the challenge of OCR for Telugu, we make three contributions in this work: (i) a database of Telugu characters, (ii) a deep-learning-based OCR algorithm, and (iii) a client-server solution for the online deployment of the algorithm. For the benefit of the Telugu people and the research community, we will make our code freely available at https://gayamtrishal.github.io/OCR_Telugu.github.io/.


Similar resources

Multi-font Optical Character Recognition System for Printed Telugu Text

The Telugu OCR systems currently available on the market recognize only specific Telugu fonts. This paper describes the development of a multi-font OCR system for printed Telugu characters using artificial neural networks. In this system, classification of the characters is carried out using a multilayer neural network architecture.


An Overview of Optical Character Recognition Systems Research on Telugu Language

This paper gives an overview of the development process and ongoing research on optical character recognition (OCR) systems for Telugu text. The aim of this paper is to provide a starting point for researchers entering this field. In this paper, we present the introduction, characteristics of the Telugu language, developmental process of the OCR systems of Telugu language, research...


OCR of Printed Telugu Text with High Recognition Accuracies

Telugu is one of the oldest and most popular languages of India, spoken by more than 66 million people, especially in South India. Development of optical character recognition systems for Telugu text is an area of current research. OCR of Indian scripts is much more complicated than OCR of the Roman script because of the huge number of combinations of characters and modifiers. Basic Symbols are...


An Approach for Telugu Numeral Recognition by Moment Invariants in Wavelet Transform Domain

Optical Character Recognition (OCR) is the task of recognizing the characters present in a digital image of text, which may be machine-printed or handwritten. OCR is one of the most fascinating and challenging areas of pattern recognition, with many potential practical applications. The present paper proposes a method for recognizing the Telugu numerals from zero to nine by usi...


A Study and Comparative Analysis of Different Stemmer and Character Recognition Algorithms for Indian Gujarati Script

A lot of work has been reported on optical character recognition for various non-Indian scripts like Chinese, English, and Japanese, and for Indian scripts like Tamil, Hindi, Telugu, etc. In this paper, we present a literature review on stemmers, optical character recognition (OCR), and text mining work on Indian scripts, mainly on the Gujarati language. We have discussed the different techniques for...



Journal:
  • CoRR

Volume: abs/1711.07245

Publication date: 2017